-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hide table layouts from engine #363
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, but otherwise looks good.
// Used during predicate refinement over multiple passes of predicate pushdown | ||
// TODO: think about how to get rid of this in new planner | ||
private final TupleDomain<ColumnHandle> currentConstraint; | ||
|
||
private final TupleDomain<ColumnHandle> enforcedConstraint; | ||
|
||
public static TableScanNode newInstance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a strange design, which I think predates this change. Maybe add a comment on why this factory is different from the constructor with the same signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for pointing it out. I need to see if I can get rid of it. Without it, two of the constructors had the same shape (but ended up doing different things). I may be able to consolidate them now that we don’t have table layouts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out I can't. I tried making the static factory method the one that's used to deserialized from json, but it needs to set some fields to null, which is disallowed by the full constructor. Oh well.. I added a comment to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we used to have such constructor and it used to work. What was wrong with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please refresh my memory and explain why TableLayout*
were introduced in the first place and why it no longer holds?
|
||
import static java.util.Objects.requireNonNull; | ||
|
||
public final class TableHandle | ||
{ | ||
private final ConnectorId connectorId; | ||
private final ConnectorTableHandle connectorHandle; | ||
private final ConnectorTransactionHandle transaction; | ||
private final Optional<ConnectorTableLayoutHandle> layout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it looks like to much of information to be stored in TableHandle
. To me TableHandle
represents an handle to a table, same way like a column handle. It is expected to be lightweight. What you have here is designed for very specific use case, hence maybe a new entity should be created for that.
Notice that you are planning to put even more information here (aggregations, joins, projections), the more you add here the less "table"-like it becomes, instead it becomes a more generic "relation" (no longer a table). Also notice that TableHandle
is used for insert (perfect fit usage) and now with your changes it becomes questionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it looks like to much of information to be stored in TableHandle. To me TableHandle represents an handle to a table, same way like a column handle. It is expected to be lightweight. What you have here is designed for very specific use case, hence maybe a new entity should be created for that.
Storing the TableLayout is only temporary. I plan to eventually get rid of them entirely, but didn't do so yet because it requires backward incompatible changes to the SPI.
But at the end of the day, yes, the plan is for TableHandle to potentially represent more than raw tables. The object doesn't need to be heavyweight -- it's up to the connector to decide what it holds on to internally (it could be just an id to an stored definition, etc). As to whether it becomes more "relation"-like, that's not much of a concern. In fact, the SQL spec treats everything as some form of a table:
A table is a collection of zero or more rows where each row is a sequence of one or more column values.
[...]
A table is either a base table, a derived table, or a transient table.
[...]
A derived table is a table derived directly or indirectly from one or more other tables by the evaluation of an expression,
such as a <joined table>, <data change delta table>, <query expression>, or <table expression>. A <query expression>
can contain an optional <order by clause>
Also notice that TableHandle is used for insert (perfect fit usage) and now with your changes it becomes questionable.
That's ok, because whether a table can be inserted into is decided at analysis time. So you'd never end up with a TableHandle to insert into that represents something that can't be inserted into. BTW, did you know the SQL spec allows for inserting into things that are more complex that simple tables? For example, there's an entire section describing "updatable views".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. Anyway, I still think that TableHandle
was a simple wrapper around ConnectorTableHandle
and there are still places where it is used as such. Now, they will get a complex type, so they would need to at least check if the layout is empty there.
Storing the TableLayout is only temporary. I plan to eventually get rid of them entirely, but didn't do so yet because it requires backward incompatible changes to the SPI.
But there will be something else that will replace TableLayout
. So TableHandle
won't become any simpler.
So you'd never end up with a TableHandle to insert into that represents something that can't be inserted into.
Having separate a type would make expressive in the code that such case is impossible (without knowing the actual flow).
BTW, did you know the SQL spec allows for inserting into things that are more complex that simple tables?
Mind blowing. Anyway, I don't think that TableHandle
that is suited for read will match that.
It's a feature we added a few years ago for a very specific use case at FB to allow connectors to provide multiple physical organizations of a table for the engine to use during optimization. No one else is using it. Since it makes it hard to introduce new features like complex operation pushdown, we're removing it for now. We may reintroduce it later in some other shape (materialized query tables/materialized views, etc). |
Would you mind to sort commits, that way it will be easier to review. |
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for all commits but Hide TableLayouts from engine
|
||
import static java.util.Objects.requireNonNull; | ||
|
||
public final class TableHandle | ||
{ | ||
private final ConnectorId connectorId; | ||
private final ConnectorTableHandle connectorHandle; | ||
private final ConnectorTransactionHandle transaction; | ||
private final Optional<ConnectorTableLayoutHandle> layout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. Anyway, I still think that TableHandle
was a simple wrapper around ConnectorTableHandle
and there are still places where it is used as such. Now, they will get a complex type, so they would need to at least check if the layout is empty there.
Storing the TableLayout is only temporary. I plan to eventually get rid of them entirely, but didn't do so yet because it requires backward incompatible changes to the SPI.
But there will be something else that will replace TableLayout
. So TableHandle
won't become any simpler.
So you'd never end up with a TableHandle to insert into that represents something that can't be inserted into.
Having separate a type would make expressive in the code that such case is impossible (without knowing the actual flow).
BTW, did you know the SQL spec allows for inserting into things that are more complex that simple tables?
Mind blowing. Anyway, I don't think that TableHandle
that is suited for read will match that.
@@ -29,15 +29,22 @@ | |||
|
|||
public class TableLayoutResult | |||
{ | |||
private final TableHandle newTableHandle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why newTableHanlde
and not just tableHandle
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just to indicate that this is the new table handle to be used to refer to the result of pushing the predicates into the table scan. It's named that way for clarity, but it doesn't matter in the medium term since I'm planning to remove this class altogether once the replacement from https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations is in place and we give connectors some time to migrate.
{ | ||
this.newTableHandle = newTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rnn
@@ -90,20 +90,14 @@ public PickTableLayout(Metadata metadata, SqlParser parser) | |||
{ | |||
return ImmutableSet.of( | |||
checkRulesAreFiredBeforeAddExchangesRule(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still required? I would replace with comment in PlanOptimizers
. There is no reason to prevent this class to be invoked later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not. I'll remove it.
...o-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PushPredicateIntoTableScan.java
Show resolved
Hide resolved
// Used during predicate refinement over multiple passes of predicate pushdown | ||
// TODO: think about how to get rid of this in new planner | ||
private final TupleDomain<ColumnHandle> currentConstraint; | ||
|
||
private final TupleDomain<ColumnHandle> enforcedConstraint; | ||
|
||
public static TableScanNode newInstance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we used to have such constructor and it used to work. What was wrong with it?
Before, we had two constructors: @JsonCreator
public TableScanNode(
@JsonProperty("id") PlanNodeId id,
@JsonProperty("table") TableHandle table,
@JsonProperty("outputSymbols") List<Symbol> outputs,
@JsonProperty("assignments") Map<Symbol, ColumnHandle> assignments,
@JsonProperty("layout") Optional<TableLayoutHandle> tableLayout) and public TableScanNode(
PlanNodeId id,
TableHandle table,
List<Symbol> outputs,
Map<Symbol, ColumnHandle> assignments) After removing table layouts, both constructors have the same signature, but do something different. |
It will still be a simple wrapper around
No, there won't. It will be just TableHandles wrapping ConnectorTableHandles. It's up to connectors to decide what they need to track in a ConnectorTableHandle based on what can be pushed down into the connector.
That's really hard to do in general. It forces entire parallel hierarchies and APIs to represent similar objects with different constraints (e.g., we'd need separate APIs to resolve a table during analysis for each possible use so we can get a table handle for insert, a table handle for delete, etc.). Also, where do we stop? Would we have one of each expression kind for each SQL type to prevent misuse? It's a fine balance, and I think the current concepts provide good separation between action (expression/plan node type), object (columns, tables, arguments) and attributes (physical or logical properties) without requiring a NxMxR explosion of valid combinations. |
858edcb
to
6773bb3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well done!
presto-main/src/main/java/io/prestosql/sql/planner/optimizations/MetadataQueryOptimizer.java
Outdated
Show resolved
Hide resolved
@@ -25,15 +25,22 @@ | |||
|
|||
public class TableLayoutResult | |||
{ | |||
private final TableHandle newTableHandle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this field since we still have layout
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout doesn't have a reference to the ConnectorTableHandle that's needed to construct the new TableHandle. I could add that, but it seemed conceptually cleaner to do it this way.
presto-main/src/main/java/io/prestosql/sql/planner/optimizations/HashGenerationOptimizer.java
Show resolved
Hide resolved
presto-main/src/test/java/io/prestosql/operator/BenchmarkScanFilterAndProjectOperator.java
Show resolved
Hide resolved
More accurately, constructs a set of alternate plans with the filter pushed into the table scan based on available table layouts.
@@ -144,12 +145,14 @@ public static RowExpression translate( | |||
FunctionRegistry functionRegistry, | |||
TypeManager typeManager, | |||
Session session, | |||
boolean optimize) | |||
boolean optimize, | |||
Map<Symbol, Integer> layout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd put this below type
} | ||
}); | ||
|
||
Block block = Utils.nativeValueToBlock(expectedType, result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment // convert result from stack type to Type ObjectValue
This test relies on a layout being picked during planning. For LocalQueryRunner this only happens because of a synthetic rule (PickLayout w/o predicate) that attaches a layout to the table scan. So, in a sense, it's just a coincidence that it works. In distributed execution, the job is done by AddExchanges, so we want to make sure we're testing that behavior.
It's just selecting a layout for the raw table scan, so no need to go through the logic for pushing a predicate, etc.
2eebec6
to
f10d5f2
Compare
This change hides table layouts from the engine as a first-class concept. We keep the SPI as is for backward compatibility for now. When predicates are pushed into a table scan by PickLayout (now PushPredicateIntoTableScan) or AddExchanges, we now replace the table handle associated with the table scan with a new one that contains the reference to the ConnectorTableLayoutHandle under the covers.
It now contains a single rule, so no point in having it return a rule set.
In order to translate expression to row expressions, the code was first replacing all symbol references with field references for the corresponding ordinal inputs. This is unnecessary, as the translation can be done on demand as the expression is translated to a row expression.
The inferred type of the former expression is INTEGER, which doesn't match the signature of combineHash function call.
They were only being used in tests. The engine no longer relies on them for query execution.
There's no longer a conflict with analyzeExpressionsWithInputs so simplify the name
There's only one caller, so no need for an extra indirection.
@@ -252,10 +251,10 @@ public Object evaluate() | |||
return visitor.process(expression, new NoPagePositionContext()); | |||
} | |||
|
|||
public Object evaluate(int position, Page page) | |||
public Object evaluate(SymbolResolver inputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between this evaluate
and the optimize
now? If you provide SymbolResolver
, using optimize
would be just fine, right?
In preparation for https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations, this change removes support for multiple layouts. It's a feature that's not used by any known connector, complicates the work of the optimizer and just gets in the way. In the future, we may add this functionality again but in a different form.